Wellington: a novel method for the accurate identification of digital genomic footprints from DNase-seq data
Authors
Abstract
The expression of eukaryotic genes is regulated by cis-regulatory elements such as promoters and enhancers, which bind sequence-specific DNA-binding proteins. One of the great challenges in the gene regulation field is to characterise these elements. This involves the identification of transcription factor (TF) binding sites within regulatory elements that are occupied in a defined regulatory context. Digestion with DNase and the subsequent analysis of regions protected from cleavage (DNase footprinting) has for many years been used to identify specific binding sites occupied by TFs at individual cis-elements with high resolution. This methodology has recently been adapted for high-throughput sequencing (DNase-seq). In this study, we describe an imbalance in the DNA strand-specific alignment information of DNase-seq data surrounding protein-DNA interactions that allows accurate prediction of occupied TF binding sites. Our study introduces a novel algorithm, Wellington, which considers the imbalance in this strand-specific information to efficiently identify DNA footprints. This algorithm significantly enhances specificity by reducing the proportion of false positives and requires significantly fewer predictions than previously reported methods to recapitulate an equal amount of ChIP-seq data. We also provide an open-source software package, pyDNase, which implements the Wellington algorithm to interface with DNase-seq data and expedite analyses.
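The strand imbalance described in the abstract can be illustrated with a small sketch. It is a toy scoring function, not the pyDNase implementation or the exact published statistic. It assumes that forward-strand cuts pile up in the shoulder upstream of a protected site and reverse-strand cuts in the shoulder downstream, and it uses a one-sided binomial test to ask whether the candidate footprint holds fewer cuts than its share of the window would predict. The function names, the half-open window convention, and the shoulder parameter are all hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed directly from the definition."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def strand_imbalance_score(fwd_cuts, rev_cuts, start, end, shoulder):
    """Toy footprint score for the half-open window [start, end).

    fwd_cuts / rev_cuts: per-base DNase cut counts on the + and - strands.
    Assumption: + strand cuts are compared against the upstream shoulder,
    - strand cuts against the downstream shoulder.
    Returns the product of two one-sided binomial p-values; smaller values
    mean stronger depletion of cuts inside the window on both strands.
    """
    fp_len = end - start
    up = max(0, start - shoulder)                # upstream shoulder start
    down = min(len(rev_cuts), end + shoulder)    # downstream shoulder end

    fwd_fp = sum(fwd_cuts[start:end])   # + strand cuts inside the footprint
    fwd_sh = sum(fwd_cuts[up:start])    # + strand cuts in upstream shoulder
    rev_fp = sum(rev_cuts[start:end])   # - strand cuts inside the footprint
    rev_sh = sum(rev_cuts[end:down])    # - strand cuts in downstream shoulder

    # Under the null, cuts fall in the footprint in proportion to its length.
    p_fwd_null = fp_len / (fp_len + (start - up))
    p_rev_null = fp_len / (fp_len + (down - end))

    p_fwd = binom_cdf(fwd_fp, fwd_fp + fwd_sh, p_fwd_null)
    p_rev = binom_cdf(rev_fp, rev_fp + rev_sh, p_rev_null)
    return p_fwd * p_rev


# Synthetic example: a protected 10 bp site flanked by strand-specific cuts.
fwd = [5] * 10 + [0] * 10 + [1] * 10
rev = [1] * 10 + [0] * 10 + [5] * 10
protected = strand_imbalance_score(fwd, rev, 10, 20, 10)

# Uniform cutting with no protection, for comparison.
uniform = strand_imbalance_score([2] * 30, [2] * 30, 10, 20, 10)
```

On this synthetic input the protected window scores far lower than the uniform one. Real data would also need multiple-testing correction and a choice of footprint and shoulder sizes, which this sketch leaves out.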
Similar resources
Corrigendum: Comparative evaluation of DNase-seq footprint identification strategies
DNase I is an enzyme preferentially cleaving DNA in highly accessible regions. Recently, Next-Generation Sequencing has been applied to DNase I assays (DNase-seq) to obtain genome-wide maps of these accessible chromatin regions. With high-depth sequencing, DNase I cleavage sites can be identified with base-pair resolution, revealing the presence of protected regions ("footprints"), correspondin...
Explicit DNase sequence bias modeling enables high-resolution transcription factor footprint detection
DNaseI footprinting is an established assay for identifying transcription factor (TF)-DNA interactions with single base pair resolution. High-throughput DNase-seq assays have recently been used to detect in vivo DNase footprints across the genome. Multiple computational approaches have been developed to identify DNase-seq footprints as predictors of TF binding. However, recent studies have poin...
DNaseR: DNase I footprinting analysis of DNase-seq data
The combination of DNase I digestion and high-throughput sequencing (DNase-seq) has been used recently to map chromatin accessibility in a given tissue or cell type on a genome-wide scale (Song and Crawford, 2010). In addition to DNase I hypersensitive sites (DHSs), short regions of protected nucleotides known as footprints can be detected using a technique known as "digital genomic footprinting...
Most brain disease-associated and eQTL haplotypes are not located within transcription factor DNase-seq footprints in brain
Dense genotyping approaches have revealed much about the genetic architecture both of gene expression and disease susceptibility. However, assigning causality to genetic variants associated with a transcriptomic or phenotypic trait presents a far greater challenge. The development of epigenomic resources by ENCODE, the Epigenomic Roadmap and others has led to strategies that seek to infer the l...
On Accounting for Sequence-Specific Bias in Genome-Wide Chromatin Accessibility Experiments: Recent Advances and Contradictions
Uncovering the protein–DNA interactions involved in cell fate, development, and disease in a time- and cell-specific manner is a fundamental goal of molecular biology. The advent of sequencing technologies has opened a new genomic era, uncovering the information encoded in genomes, epigenomes, and transcriptomes (McPherson, 2014). For example, the popular ChIP-based techniques ChIP-seq (Johnso...